13 research outputs found

    An infrastructure for Turkish prosody generation in text-to-speech synthesis

    Text-to-speech engines benefit from natural language processing when generating appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps: pronunciation disambiguation, phonological phrase detection, and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We propose a phonological phrase detection scheme based on syntactic analysis for Turkish and assign one of three intonation levels to words in detected phrases. Empirical observations on 100 sentences show that the proposed scheme works with approximately 85% accuracy.

    Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

    The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on these representations performs the statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, the problem arises of disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English but is not suitable for languages such as Turkish. We then use a heuristic approach, based on dependency parsing, to mark up phrase boundaries as a basis for phrase-level accentuation in Turkish TTS synthesizers.
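    The recall and precision figures above can be read through a small sketch. It assumes a definition that the abstract does not spell out: the disambiguator may keep more than one candidate per token, recall is the share of tokens whose correct candidate survives, and precision is the share of surviving candidates that are correct. The function name and the toy labels are hypothetical.

```python
def precision_recall(outputs: list[set[str]], gold: list[str]) -> tuple[float, float]:
    """Score a disambiguator that may keep several candidates per token.

    Assumed definitions (not stated in the abstract):
      recall    = tokens whose gold label is among the kept candidates / tokens
      precision = tokens whose gold label is among the kept candidates / kept candidates
    """
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")

    # Count tokens for which the correct label survived disambiguation.
    correct = sum(1 for kept, answer in zip(outputs, gold) if answer in kept)
    # Count every candidate the disambiguator left in place.
    retained = sum(len(kept) for kept in outputs)

    precision = correct / retained if retained else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: three tokens, hypothetical pronunciation labels.
toy_outputs = [{"a"}, {"b", "c"}, {"d"}]
toy_gold = ["a", "b", "x"]
toy_precision, toy_recall = precision_recall(toy_outputs, toy_gold)
```

    Under these assumptions, keeping extra candidates can only help recall while lowering precision, which is one way a system can report recall above its precision.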

    Robustness of Massively Parallel Sequencing Platforms

    Improvements in high-throughput sequencing (HTS) technologies have made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although the accuracy and reproducibility of HTS-based analyses have improved significantly, using these data for diagnostic and prognostic applications requires near-perfect data generation. To assess the robustness of a widely used HTS platform for accurate and reproducible clinical applications, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, and only a small fraction of the discordant variants were within exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful to provide data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before use in clinical applications.
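    The call-set comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each call set has already been reduced to a set of (chromosome, position, reference allele, alternate allele) records, and the `Variant` type, the `compare_call_sets` function, and the toy variants are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One variant call; two calls match only if all four fields match."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_call_sets(a: set[Variant], b: set[Variant]) -> dict[str, float]:
    """Count shared and center-specific calls between two call sets."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard-style concordance: shared calls over all distinct calls.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Toy call sets for the same sample sequenced at two centers (made-up variants).
center_1 = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "G", "GA"),
}
center_2 = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 50, "G", "GA"),
    Variant("chr3", 10, "T", "C"),
}
summary = compare_call_sets(center_1, center_2)
```

    Exact matching on all four fields is a simplifying assumption; indels in particular can be represented in more than one way, so a real comparison would normalize variants first.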

    Pronunciation disambiguation in Turkish

    In text-to-speech systems and in developing transcriptions for acoustic speech data, one is faced with the problem of disambiguating the pronunciation of a token in the context in which it is used, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. In this paper, we investigate pronunciation disambiguation in Turkish as a natural language processing problem and present preliminary results using a morphological disambiguation technique based on the notion of distinguishing tag sets.

    Comparisons of total and novel SNP and indel intersections of B1 vs. T1 and B2 vs. T2. B1, T1: pooled S1 calls from BGI and TÜBİTAK datasets using HaplotypeCaller; B2, T2: pooled S2 calls from BGI and TÜBİTAK datasets, respectively.


    Summary of the sequence datasets.

    Basic statistics of the two samples (S1, S2) sequenced at two different centers. S1T refers to sample S1 sequenced at TÜBİTAK, whereas the dataset S1B was generated from the same sample at BGI. Similarly, datasets from sample S2 are denoted S2T and S2B.